Multimodal Skipgram Using Convolutional Pseudowords

Authors

  • Zachary Seymour
  • Yingming Li
  • Zhongfei Zhang
Abstract

This work studies representational mappings across multimodal data: given a piece of raw data in one modality, the corresponding semantic description in terms of raw data in another modality is obtained immediately. Such mappings arise in a wide spectrum of real-world applications, including image/video retrieval, object recognition, action/behavior recognition, and event understanding and prediction. To that end, we introduce a simplified training objective for learning multimodal embeddings with the skip-gram architecture by defining convolutional "pseudowords": embeddings formed by additively combining distributed word representations with image features from convolutional neural networks projected into the multimodal space. We present extensive results on the representational properties of these embeddings across various word similarity benchmarks to show the promise of this approach.
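The pseudoword construction described in the abstract can be illustrated with a minimal sketch. The abstract states only that a pseudoword is the additive combination of a word vector and a CNN image feature projected into the multimodal space; everything else here is an assumption made for illustration, including the names `pseudoword`, `sgns_loss`, and `projection`, the toy dimensions, the random stand-in features, and the use of the standard skip-gram negative-sampling loss as the objective. It is not the authors' implementation.

```python
import numpy as np

def pseudoword(word_vec, image_feat, projection):
    """Add a linearly projected image feature to a word vector.

    `projection` maps image-feature space into the embedding space
    (a hypothetical linear map; the abstract does not specify its form).
    """
    return word_vec + projection @ image_feat

def sgns_loss(center, context, negatives):
    """Standard skip-gram negative-sampling loss for one training pair.

    Assumed here as the objective; the abstract only mentions a
    'simplified training objective' without giving its form.
    """
    def log_sigmoid(x):
        # Numerically stable log(sigmoid(x)) = -log(1 + exp(-x)).
        return -np.logaddexp(0.0, -x)

    positive = log_sigmoid(center @ context)
    negative = log_sigmoid(-(negatives @ center)).sum()
    return -(positive + negative)

# Toy sizes and random data, purely illustrative.
rng = np.random.default_rng(0)
embed_dim, image_dim = 8, 32
word_vec = rng.normal(size=embed_dim)
image_feat = rng.normal(size=image_dim)      # stand-in for a CNN feature
projection = rng.normal(size=(embed_dim, image_dim)) * 0.1
context = rng.normal(size=embed_dim)
negatives = rng.normal(size=(5, embed_dim))  # five sampled negative words

center = pseudoword(word_vec, image_feat, projection)
loss = sgns_loss(center, context, negatives)
```

In this sketch a word with no associated image would simply use its plain word vector as the center, since adding a zero image feature leaves the vector unchanged.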

Similar resources

Dependency Based Embeddings for Sentence Classification Tasks

We compare different word embeddings from a standard window based skipgram model, a skipgram model trained using dependency context features and a novel skipgram variant that utilizes additional information from dependency graphs. We explore the effectiveness of the different types of word embeddings for word similarity and sentence classification tasks. We consider three common sentence classi...

Modeling the Visual Word Form Area Using a Deep Convolutional Neural Network

The visual word form area (VWFA) is a region of the cortex located in the left fusiform gyrus, that appears to be a waystation in the reading pathway. The discovery of the VWFA occurred in the late twentieth century with the advancement of functional magnetic resonance imaging (fMRI). Since then, there has been an increasing number of neuroimaging studies to understand the VWFA, and there are d...

Visually Grounded and Textual Semantic Models Differentially Decode Brain Activity Associated with Concrete and Abstract Nouns

Important advances have recently been made using computational semantic models to decode brain activity patterns associated with concepts; however, this work has almost exclusively focused on concrete nouns. How well these models extend to decoding abstract nouns is largely unknown. We address this question by applying state-of-the-art computational models to decode functional Magnetic Resonanc...

Multimodal MRI brain tumor segmentation using random forests with features learned from fully convolutional neural network

In this paper, we propose a novel learning based method for automated segmentation of brain tumor in multimodal MRI images. The machine learned features from fully convolutional neural network (FCN) and hand-designed texton features are used to classify the MRI image voxels. The score map with pixelwise predictions is used as a feature map which is learned from multimodal MRI training dataset u...

Benchmarking Multimodal Sentiment Analysis

We propose a framework for multimodal sentiment analysis and emotion recognition using convolutional neural network-based feature extraction from text and visual modalities. We obtain a performance improvement of 10% over the state of the art by combining visual, text and audio features. We also discuss some major issues frequently ignored in multimodal sentiment analysis research: the role of ...

Journal:
  • CoRR

Volume abs/1511.04024

Publication year 2015